Part 1. Probability Theory (part a)
You will see a lot of problems about coin flips and selecting balls from urns.
What does this have to do with social science?
Sometimes a coin flip, an urn, or a similar device actually determines which units/observations we see: who gets selected for a survey.
The urn problems help us understand how the sample might differ from the population (and thus how certain we can be about characteristics of the population using the sample).
Sometimes a coin flip, an urn, or a similar device actually determines which units/observations get a random treatment, e.g. in a randomized experiment or the Vietnam draft lottery.
The urn problems help us compare differences we see between treatment and control units to differences we might see by chance if the treatment had no effect.
Even when there was no random selection (e.g. data on all countries) we can act as if there was, or act as if the dependent variable (e.g. revolution) has a random component.
Then the urn problems again help us compare the “sample” to the “population”, or observed reality to what might have happened in an alternate history if we treat our ignorance as chance.
In probability problems, we know what’s in the urn and we want to describe the possible draws.
In many statistics problems, we have one draw and we want to speculate what might be in the urn (i.e. population).
A random generative process is a repeatable mechanism that can select an outcome from a set of possible outcomes.
Each draw or realization of the process may be uncertain (to the typical observer), but the frequency of each event can be described.
e.g. flipping a coin, rolling a die, drawing a ball from an urn.
Frequentist definition of probability: The probability of an event (e.g. “green ball is chosen”) is the proportion of many, many draws producing that event.
Bayesian definition of probability: The probability of an event is an observer’s degree of belief that the event will happen or has happened. Logical and subjective variants.
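The frequentist definition can be illustrated with a quick simulation (a minimal sketch; the function name and the seed are illustrative choices, not from the slides): flip a fair coin many times and take the proportion of heads as the estimated probability.

```python
import random

def frequentist_estimate(p_heads, n_flips, seed=0):
    """Estimate P(heads) as the proportion of heads in many flips.

    p_heads, n_flips, and seed are illustrative parameters.
    """
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips

# The long-run proportion of heads settles near the true probability 0.5.
est = frequentist_estimate(0.5, 100_000)
```

With 100,000 flips the estimate is typically within a fraction of a percentage point of 0.5, which is the frequentist notion of probability in action.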
Sample space \(\Omega\) (“Omega”) is the set of all possible outcomes of the random generative process. Each element \(\omega\) (“omega”) is a unique outcome of the process.
For a coin flip, \(\Omega = \{H, T\}\); \(\omega \in \{H, T\}\).
For a single roll of a six-sided die, \(\Omega = \{1, 2, 3, 4, 5, 6\}\).
How about for a single roll of two six-sided dice?
\[\Omega = \{(x, y) \in \mathbb{Z}^2 : 1 \leq x \leq 6, 1 \leq y \leq 6 \}\] (“Set-builder notation”, used w/o explanation at A&M p. 5)
An easier (if longer) way to write the same set, listing the outcomes:
\(\Omega = \{(1,1), (1,2), \ldots (1,6), (2,1), (2,2), \ldots, (6,6) \}\)
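The set-builder expression and the explicit listing describe the same sample space; we can enumerate it directly (a sketch; the variable name `omega` is our choice):

```python
from itertools import product

# Sample space for one roll of two six-sided dice, as ordered pairs (x, y).
omega = [(x, y) for x, y in product(range(1, 7), repeat=2)]

# 36 equally likely outcomes, from (1, 1) through (6, 6).
n_outcomes = len(omega)
```

This matches the listing above: 6 choices for the first die times 6 for the second gives 36 outcomes.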
An event is a collection of outcomes to which we want to assign a probability. (A subset of the sample space.)
Examples: for one die roll, “even number” is \(\{2, 4, 6\}\); “at least 5” is \(\{5, 6\}\).
An event space \(S\) is a set of events composed in a particular way (for technical reasons): it includes \(\Omega\) itself and is closed under complements and countable unions.
A probability measure is a function \(P : S \rightarrow \mathbb{R}\) that assigns a probability to every event in the event space.
Kolmogorov axioms: \((\Omega, S, P)\) is a probability space if it satisfies the following:
Non-negativity: \(P(A) \geq 0\) for every \(A \in S\).
Normalization: \(P(\Omega) = 1\).
Countable additivity: for pairwise disjoint events \(A_1, A_2, A_3, \ldots \in S\),
\[P(A_1 \cup A_2 \cup A_3 \cup \ldots ) = P(A_1) + P(A_2) + P(A_3) + \ldots = \sum_i P(A_i) \] (for events that cannot co-occur, the probability that one of the events occurs is the sum of the individual probabilities)
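The axioms are easy to check on a finite example (a sketch; the fair-die space and the particular events are illustrative):

```python
from fractions import Fraction

# Finite probability space for one roll of a fair die:
# Ω = {1,...,6}, every subset is an event, each outcome has probability 1/6.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & omega), len(omega))

# Non-negativity and normalization:
assert P(set()) == 0 and P(omega) == 1
# Additivity for the disjoint events A1 = {1, 2} and A2 = {5}:
A1, A2 = {1, 2}, {5}
assert P(A1 | A2) == P(A1) + P(A2)
```

Using `Fraction` keeps the arithmetic exact, so the axiom checks are equalities rather than floating-point approximations.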
Let \((\Omega, S, P)\) be a probability space. Then
Let’s prove it! (on board, also see A&M page 8, or slide notes by pressing s)
Goal: a strong system of understanding; every statement follows from the axioms.
Every statement supported, no unnecessary assumptions.
There are “rules” (e.g. addition rule) and “laws” (e.g. law of total probability) but no one made them (directly).
Def 1.1.5: For \(A, B \in S\), the joint probability of \(A\) and \(B\) is the probability that both \(A\) and \(B\) happen, i.e. \(P(A \cap B)\)
Addition rule: For \(A, B \in S\),
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B)\] Let’s prove it! (on board, also see the book, or slide notes by pressing s)
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B)\]
Assumptions behind this use of a Venn diagram:
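The addition rule can also be verified by brute-force counting on the two-dice sample space (a sketch; the events \(A\) and \(B\) below are illustrative choices, not from the slides):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two-dice sample space

def P(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Illustrative events: A = "first die shows 6", B = "sum is at least 10".
A = lambda w: w[0] == 6
B = lambda w: w[0] + w[1] >= 10

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
union = P(lambda w: A(w) or B(w))
assert union == P(A) + P(B) - P(lambda w: A(w) and B(w))
```

Here \(P(A) = 6/36\), \(P(B) = 6/36\), and \(P(A \cap B) = 3/36\), so the union has probability \(9/36\); subtracting the intersection corrects for the double-counted overlap.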
Def 1.1.8: For \(A, B\) with \(P(B) > 0\), the conditional probability of \(A\) given \(B\) is
\[P(A \mid B) = \frac{P(A \cap B)}{P(B)}\]
Read \(P(A \mid B)\) as “probability of \(A\) given \(B\)”.
Product rule: for \(A, B\) with \(P(B) > 0\),
\[P(A \cap B) = P(B) P(A \mid B)\]
Here, product rule follows from definition of conditional probability; in logical approach (Cox’s Theorem) it follows from consistency axioms.
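Both Def 1.1.8 and the product rule can be checked by counting on the two-dice space (a sketch; the events are illustrative):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two-dice sample space

def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Illustrative events: A = "sum is 7", B = "first die shows an even number".
A = lambda w: sum(w) == 7
B = lambda w: w[0] % 2 == 0

# Def 1.1.8: P(A | B) = P(A ∩ B) / P(B).
cond = P(lambda w: A(w) and B(w)) / P(B)
# Product rule: P(A ∩ B) = P(B) P(A | B).
assert P(lambda w: A(w) and B(w)) == P(B) * cond
```

Conditioning on \(B\) shrinks the sample space to the 18 outcomes with an even first die; the 3 of those summing to 7 give \(P(A \mid B) = 1/6\).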
If \(A_1, A_2, \ldots \in S\) are nonempty and pairwise disjoint, and \(\Omega = A_1 \cup A_2 \cup \ldots\), then \(A_1, A_2, \ldots\) is a partition of \(\Omega\).
Law of total probability: if \(\{A_1, A_2, \ldots \}\) is a partition of \(\Omega\) and \(P(A_i) > 0 \, \forall \, i\), then
\[P(B) = \sum_i P(B\cap A_i) = \sum_i P(B \mid A_i) P(A_i)\]
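The law of total probability can be checked numerically by partitioning the two-dice space on the value of the first die (a sketch; the partition and the event \(B\) are illustrative choices):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two-dice sample space

def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Partition of Ω by the first die: A_i = "first die shows i", i = 1..6.
partition = [lambda w, i=i: w[0] == i for i in range(1, 7)]
B = lambda w: sum(w) == 7  # illustrative event: "sum is 7"

# Law of total probability: P(B) = Σ_i P(B ∩ A_i).
total = sum(P(lambda w, Ai=Ai: B(w) and Ai(w)) for Ai in partition)
assert total == P(B)
```

Each cell of the partition contributes \(P(B \cap A_i) = 1/36\) (exactly one second-die value makes the sum 7), and the six pieces add up to \(P(B) = 1/6\).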
Definition: Events \(A, B \in S\) are independent if \(P(A \cap B) = P(A)P(B)\)
Informally, knowing \(A\) occurs does not tell you anything about whether \(B\) occurs.
Theorem 1.1.16 For \(A, B \in S\) with \(P(B) > 0\), \(A\) and \(B\) are independent (i.e. \(A \perp \!\!\! \perp B\)) if and only if \(P(A \mid B) = P(A).\)
Proof:
\(A \perp \!\!\! \perp B\) \(\iff P(A \cap B) = P(A)P(B)\) (definition)
\(A \perp \!\!\! \perp B\) \(\iff P(A \mid B) P(B) = P(A)P(B)\) (product rule)
\(A \perp \!\!\! \perp B\) \(\iff P(A \mid B) = P(A).\) \(\,\, \blacksquare\) (divide by \(P(B)\))
In subjective terms, knowing \(B\) occurred does not affect our assessment of probability that \(A\) occurred.
(Recall: events \(A\) and \(B\) independent means \(P(A \cap B) = P(A)P(B)\) and \(P(A|B) = P(A)\))
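Before turning to the examples, the definition of independence can itself be tested by counting (a sketch; the dice events below are illustrative, not the university examples that follow):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two-dice sample space

def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Illustrative events: A = "first die is even" and B = "second die is 6"
# are independent; A and C = "sum is at least 11" are not.
A = lambda w: w[0] % 2 == 0
B = lambda w: w[1] == 6
C = lambda w: sum(w) >= 11

assert P(lambda w: A(w) and B(w)) == P(A) * P(B)  # independent
assert P(lambda w: A(w) and C(w)) != P(A) * P(C)  # dependent
```

Intuitively, the second die carries no information about the first, while a large sum tells us the first die is probably high (in fact even, since \(C \cap A\) requires a 6 first).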
Pick a student at random from the university. Are \(A\) and \(B\) independent in the following examples?